Grammar Checker Features for Author Identification and Author Profiling Notebook for PAN at CLEF 2013
نویسنده
چکیده
Our work on author identification and author profiling is based on the question: Can the number and the types of grammatical errors serve as indicators for a specific author or a group of people? In order to detect the grammatical errors we base our approach on the output of the open-source library LanguageTool. In the case of the author identification we transform the problem into a statistical test, where an unknown document is written by another author when the distribution of grammatical errors deviated from documents of a reference corpus. For author profiling we implemented an instance based classification approach, namely a k-NN classifier, in combination with a Language Model where a text is assigned to a specific age or gender group where the according reference corpus contains the closest match. In the evaluation we found that for both scenarios grammatical errors do perform better than the baseline and do capture an aspect of a writing style, which is not contained in more traditional features, like stylometric features or word n-grams.
منابع مشابه
Readability for Author Profiling? Notebook for PAN at CLEF 2013
This paper briefly describes the approach taken to the Author Profiling task at PAN 13. It describes the simple features used, and the origins in thinking around text readability as a mechanism for identification, and the predictive model used which may have beneficially omitted classes, as well as offering commentary on the results obtained.
متن کاملStyle-based Distance Features for Author Verification Notebook for PAN at CLEF 2013
In this paper we present the approach we took in our participation to the PAN 2013 Author Profiling task. It is an adaptation of our system submitted for author identification, assuming that a profile category (authors belonging to the same gender and age group categories) can be analyzed in the same way as an author’s style.
متن کاملStyle-based Distance Features for Author Profiling Notebook for PAN at CLEF 2013
In this paper we present the approach we took in our participation to the PAN 2013 Author Identification task. It relies on a complex process to select the features which represent the author’s writing, using potentially multiple statistics and distance measures computed from the training set.
متن کاملEnsemble-based Classification for Author Profiling Using Various Features Notebook for PAN at CLEF 2013
This paper summarize our approach to author profiling task – a part of evaluation lab PAN’13. We have used ensemble-based classification on large features set. All the features are roughly described and experimental section provides evaluation of different methods and classification approaches.
متن کاملUsing Simple Content Features for the Author Profiling Task Notebook for PAN at CLEF 2013
This paper describes the methods we have employed to solve the author profiling task at PAN-2013. Our goal was to use simple features to identify the age group and the gender of the author of a given text. We introduce the features, detail how the classifiers were trained, and how the experiments were run.
متن کامل